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Abstract 

With the ansatz that a data set's correlation matrix has a certain 
parametrized form (one general enough, however, to allow the arbitrary spec- 
ification of a slowly-varying decorrelation distance and population variance) 
the general machinery of Wiener or optimal filtering can be reduced from 
0(n 3 ) to 0(n) operations, where n is the size of the data set. The implied 
vast increases in computational speed can allow many common sub-optimal 
or heuristic data analysis methods to be replaced by fast, relatively sophis- 
ticated, statistical algorithms. Three examples are given: data rectification, 
high- or low- pass filtering, and linear least squares fitting to a model with 
unaligned data points. 

PACS numbers: 06.50.-x, 02.50.Rj, 02.50.Vn 
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In a previous analysis of irregularly spaced observations of the gravitationally lensed 
quasar 0957+561 we have used to good effect the full machinery of Wiener (or optimal) 
filtering in the time domain, including the use, where appropriate, of unbiased ("Gauss- 
Markov") estimators 0. These, and related, techniques are general enough to be applicable 
to data that is not only irregularly sampled (including the notoriously common case of 
"gappy" data), but also to data that is highly inhomogeneous in its error bars, or with 
point-to-point correlations (so that errors are not independent). The principal reason that 
these methods are not better known and more widely used seems to be the fact that their use 
entails the numerical solution of sets of linear equations as large as the full data set. Only 
recently have fast workstations allowed application to data sets as large as a few hundred 
points; sets larger than a few thousand points are currently out of reach even on the largest 
supercomputers, since the computational burden for n data points scales as n 3 . As an 
example, the analysis in (leading to a measurement of the offset in time of the two radio 
images of the lensed quasar) required overnight runs on a fast workstation. 

In this context, we were therefore quite surprised recently to notice that the introduction 
of a particular simplifying assumption (essentially the ansatz of a certain parametrized form 
of the data's correlation function) allows all the calculations already mentioned, and many 
more, to be done in linear time, that is, with only a handful of floating operations per data 
point. In fact we have verified that we are able to obtain results substantially identical to |2j 
in less than 3 seconds of computer time, about 10 4 times faster than the previous analysis. 

Speed increases of 10 4 or greater (that is, from 0[n 3 } to 0[n] for n data points) are not 
merely computer time savers. Such increases are enabling for the application of sophisticated 
statistical techniques to data sets that hitherto have been analysed only by heuristic and 
ad-hoc methods. The Fast Fourier Transform (FFT) is a previous example of a numerical 
algorithm whose raw speed caused it to engender a considerable universe of sophisticated 
applications. By their nature, FFT methods are generally not applicable to irregularly 
sampled, or otherwise inhomogeneous, data sets (though see 0). Although the methods 
we describe here are not related to the FFT in a mathematical sense, we think they have 
the potential to be comparably significant in engendering new and powerful techniques of 
data analysis. In the interest of making such new methods available to the widest possible 
community, we outline, in this Letter, the mathematical foundation of the class, and give 
three examples of early applications. We will also make available, via the Internet ||, a 
"developer's kit" of Fortran-90 code, fully implementing the examples given here. 

We begin with the observation that many, if not most, one-dimensional processes of 
interest (e.g., measurements as a function of time t) have a characteristic decorrelation time 
(which may itself vary with time), so that a set of measurements at the ordered times 
ti, i = 1, ...,n, have an expected (population) correlation matrix that is peaked on the 
diagonal i — j and decays away from the diagonal in both directions. We consider the case 
where this decay can be modeled, even if only roughly, by the form 

(1) 

where w(t), the reciprocal of the decorrelation time, can be thought of a slowly varying with 
time (or constant). All our results derive from the remarkable fact that the inverse of the 
matrix (1), = Ty, is tridiagonal with 
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Ti = exp 



ti+l 
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It is well known P,[7| that tridiagonal systems can be solved in linear time, requiring 8n 
arithmetic operations. A first surprising result is therefore that the operation of multiplying 
the non-sparse matrix $ by any data vector y, 



j 

can be rendered fast by instead solving (for /) the tridiagonal problem 

$Z Tijlj = Hi 



(5) 



(6) 



The matrix $ by itself is not a very good correlation matrix, since it is normalized 
to unity on the diagonal. This is easily remedied by the introduction of another (generally 
slowly varying or constant) function V(t) to represent the population variance of the process, 
that is, the typical or a priori mean square amplitude for a measurement y at time t. Our 
correlation matrix ansatz is then 



which has the (tridiagonal) inverse 



(7) 



(8) 



(The function V is not to be confused with the error or noise in a single measurement, which 
need not be slowly varying, see below.) 

In fact, equations (l)-(6) represent just one example that derives from the more general 
fast evaluation of "forward plus backward" integrals with the form 



h — Fi + Bi 



1 n 



(9) 



where the forward and backward pieces have individual exponentially decaying recurrences, 

(10) 



F i+1 = F^i + fi 



3 



where the fi (or bj) is the forward (or backward) increment across the single interval (ti, tj+i). 
The key observation is that I/s can be found from the /'s and 6's (without the intermediate 
calculation of F's and -B's) by solving the tridiagonal system 

(F 1 + e 1 (b 1 /r 1 -f 1 ) i = l 

Y^ T ij!j = \ -h-i) + e^bi/n - fi) 2<i<n-l (11) 

The special case of eq. (6) is recovered by 

/i = (Vi+i + nyi)/2 k = [yi + ny i+1 )/2 (12) 

However, other choices for fi and bi, e.g., corresponding to quadrature formulas across the 
interval (ti, U + i), have other uses, as we will see below. 

This is all the machinery we need for the three examples that we now give. The idea 
that certain special correlation matrices can have simple inverses is not new H-Ifll, but we 



are not aware of the previous use in data processing of tridiagonally fast forms as general as 
eqs. (l)-(4), (7)-(8), or (9)-(12). 

Example 1: Wiener filtering and data rectification. Here, one is given an irregularly 
sampled data set jji = y(ti), i = 1, . . . , n, with error estimates <jj. The oVs may be highly 
variable, with well-measured values intermixed with poorly-measured ones. One desires 
best estimates (in the sense, with some technical assumptions, of minimum variance, see [3]) 
of the underlying signal s(t) either at the measured times ti (Wiener filtering), or else at 
some different, usually equally spaced, set of times t'j, j = 1, . . . ,m (data rectification). In 
the real world, data rectification is now often accomplished by linear interpolation between 
nearest measured points. Such interpolation can give highly sub-optimal results, since it can 
use a poorly measured near point instead of a much better measured point only negligibly 
farther away. The procedure described here uses, in effect, an optimal combination of 
points, weighting them appropriately by their combination of nearness to the desired point 
and smallness of their measured error. 

One proceeds as follows. Step 1.1: estimate the inverse decorrelation length w(t) and 
the population variance V(t). These estimates need not be very accurate, since the error in 
the result of Wiener filtering is second order in any error in the filter. It generally suffices 
to use constant values w and V. 

Step 1.2: Form an "augmented" vector of times ti, i = 1, . . . , N, consisting of the union 
of the times at which data is measured and the times at which output is desired. (If there 
are no overlaps, then N = m + n.) Form a corresponding augmented data vector y^, with 
measured data values in the appropriate slots, an arbitrary constant value (generally the 
mean of the measured values) in the other slots. Form an augmented "reciprocal error" 
vector by similarly combining the the measured l/o* values (in the measured slots) with a 
value zero in the unmeasured slots. 

Step 1.3: Calculate the output of the Wiener filter, by using the right-hand form of the 
matrix equation 

s = S T [S + NpV* = [N- 1 + S- 1 ]- 1 ^* (13) 

Here N _1 is the diagonal matrix formed from the square of the reciprocal error vector 
N^ 1 = (1 /of) Sij while S is the correlation matrix (called C in eq. 7, above), fully specified 
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by the sets Vi, Wi, t; L . In particular, since S _1 is tridiagonal, right-hand term in brackets is 
also tridiagonal, and s can be obtained by a single fast tridiagonal solution. 

Step 1.4: Unpack the desired results from s, either the values at the measured tj's, or 
the values at the rectified t-'s, or both. 

Example 2. Low- or High-pass Filtering of Irregularly Spaced Values. Here, for brevity, 
we assume that the data is error free. (A more complicated example would combine the 
optimal estimation of Example 1 with the filtering described here.) At first sight, a matrix 
like eq. (1) does not look like a very good low-pass filter, because the discontinuity in slope 
on the diagonal introduces significant high-frequency leakage. An important trick, however, 
is to allow w to be complex, but take the filter to be the real part of the result. Then, there 
is a unique exponential filter with all the following properties: (i) no discontinuity in the 
derivative of the impulse response, (ii) frequency response unity at / = and flat through 
the third derivative, (iii) response falls off as f~ 4 (amplitude), or 24 dB per octave (power), 
for / > f c , where f c is the 3 dB cutoff frequency. Using this exponential, the filter can be 
derived from eq. (11) by the application of a simple quadrature rule to the intervals between 
the points. 

Step 2.1: Calculate the complex values 

Wj = K(l + i)f c \t j+1 -tj\ j = 1, . . . , n - 1 (14) 

where K = 5.53807 for a low-pass filter, K = 3.56427 for a high-pass filter. (The different 
values simply scale the respective 3 dB points to f c .) For each Wj, calculate r 3 - = exp[— Wj] 
(replacing eq. 3). 

Step 2.2: Solve the complex tridiagonal system 

^TijUj = - x I ( Si - s i+1 )/Wi + (si - Si_i)/Wi_i i = 2,...,n-l (15) 
i " [ (s n - s„_i)/W„_i % = n 

where the Sj's are the input values. 

Step 2.3: The outputs of the high- or low-pass filters, at the locations of the input data, 
are given respectively by 

Hs = Re(u) £s = s - Re(u) (16) 

(It is worth remarking that, by a different choice of complex exponential, one can similarly 
approximate Gaussian convolution and deconvolution.) 

Example 3. Linear least squares fitting a model, where the data points and model points 
are not available at the same positions. This is just one of many possible extensions of 
Example 1. One desires the coefficient vector q that best fits a linear combination of model 
basis functions. 

Step 3.1 is the same as Step 1.1, above. Step 3.2 is similar to step 1.2. Model (as opposed 
to data) slots should, however, be filled with zero values for both and N. Additionally 
form an augmented matrix L* each of whose columns contains the values of a single model 
basis function (in rows corresponding to the model slots) or zero (in rows corresponding to 
the data slots). 

For Step 3.3, as in Step 1.3, note that the inverse 
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CT 1 = (S + N)~ l = s _1 [i + ns^ 1 ]- 1 



(17) 



can be calculated by a single tridiagonal solution, followed by an (also fast) tridiagonal left 
multiplication by S _1 . Thus the coefficient matrix, which is given by 

q=[L:(C- 1 L,)]- 1 L:(C- 1 yJ (18) 

can be calculated by performing fast operations on the columns of L* and on y + . 

Step 3.4: Using the inverses already calculated for q, the x 2 f° r the fit is calculated as 

X 2 = (y, - L.q) r [(C7- 1 yJ - (C^L^q] (19) 

By repeating the calculations of eqs. (18) and (19), this quantity can be minimized with 
respect to any additional parameters in the fit that enter nonlinearly. (We have applied this 
procedure with great success to a problem where the nonlinear parameter is a time-scaling 
of the data, so that the positions of the measured data points are shifted and compressed or 
expanded with each iteration. Use of eqs. (18) and (19) avoids the necessity of recalculating 
the model at each iteration.) 

The examples given here are only the simplest of many related techniques that share 
the underpinnings of (i) non-sparse matrices with tridiagonal (or, more generally, band- 
diagonal or other fast) inverses, and (ii) judicious factorization to maintain fast evaluation 
at each step, as in eqs. (13) and (17). Single exponential forms are only the simplest of a 
larger family. For example, correlation matrices that are approximated by the sum of two 
exponential forms with tridiagonal inverses can be treated by the factorization 

(T' 1 + T 2 1 + N)- 1 = Tx(T 2 + Ti + T 2 NT 1 )" 1 T 2 (20) 

where Ti and T 2 are each tridiagonal, N is diagonal. The term in parentheses on the right 
is pentadiagonal, therefore also admitting fast inversion. Although the equations become 
more complicated, a sum of k exponential forms can be cast as a band-diagonal inversion 
with bandwith 2k + 1. 

This work was suppported in part by the National Science Foundation (PHY-91-06678). 
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